This report is for ETC5521 Assignment 2. It is based on the work of Team emu, comprising Justin Thomas and Mayunk Bharadwaj, and was revised by Abhishek Sinha and Yiwen Zhang.
Using the data provided on the ‘tidytuesday’ platform, our primary question is to identify the characteristics of a winning beach volleyball team for both males and females.
We believe that there might be differences in characteristics for a winning team compared to a losing team because of, for example, prevalence of beach volleyball in certain countries. Also, we theorize that taller and younger players may potentially be better at beach volleyball because of the competitive advantage they may have over shorter and more seasoned players.
Therefore, the secondary questions that will help us answer our primary question are:
Furthermore, we will explore the qualities of individual players to identify the most successful player and the most successful pairing. In addition to physiological factors (height, age) and technical factors, we will also study whether the winning team is affected by home advantage.
After studying the characteristics of the winning team, a further question arises. Although the winning team is likely to be a strong (highly ranked) team, are there situations in which a lower-ranked team defeats a higher-ranked team? We therefore add four additional questions to complete this analysis:
In the following report, the reader will be able to find a description and information about the source and limitations of the data; information on how the data was cleaned; an analysis that will answer the above questions and a conclusion.
While going through the data set, we found that it was incomplete: there were many ‘NA’ values in the individual player performance statistics. Observations containing ‘NA’ values had to be removed, as they were unlikely to be helpful in our analysis. This reduces the sample size, which may affect the accuracy of our results to some extent.
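The removal of incomplete observations can be illustrated with a minimal Python sketch. It is not the code used in this report: the rows are invented, `None` stands in for ‘NA’, and the helper name `drop_incomplete` is hypothetical. Only the column names come from the data set.

```python
# Invented example rows; None plays the role of an 'NA' value.
matches = [
    {"w_p1_tot_kills": 12, "l_p1_tot_kills": 9},
    {"w_p1_tot_kills": None, "l_p1_tot_kills": 7},
    {"w_p1_tot_kills": 15, "l_p1_tot_kills": None},
]


def drop_incomplete(rows, columns):
    """Keep only the rows in which every listed column has a value."""
    return [row for row in rows if all(row[col] is not None for col in columns)]


needed = ["w_p1_tot_kills", "l_p1_tot_kills"]
complete = drop_incomplete(matches, needed)
print(f"kept {len(complete)} of {len(matches)} rows")
```

Comparing the row counts before and after the filter shows how much the sample shrinks.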
Primary Question
What are the characteristics of a winning beach volleyball team for both males and females?
Secondary Questions
Additional Questions for Assignment 2
Looking into the FIVB circuit, is there any home advantage for winning players?
Are there cases where a lower-ranked team beats a higher-ranked team?
Which combinations of players are the most successful and have teamed up for the greatest number of matches in both volleyball circuits?
Who are the most successful players in beach volleyball, and how have they and their skill patterns evolved over time?
This data set provides beach volleyball statistics for men’s and women’s matches at two major tournaments, the Fédération Internationale de Volleyball (FIVB) Beach Volleyball World Championships and the Association of Volleyball Professionals (AVP) tour. The matches are played between teams of 2. The data set records tournament information, player information, player performance statistics and match results. The data ranges from September 2000 to August 2019 and was collected from data recorded at the tournaments.
The original data source, created by Adam Vagner, initially contained data recorded from September 2000 to July 2017; it has been periodically updated since, with the most recent update coming in May 2020. It can be found on GitHub (BigTimeStats, n.d.).
The structure of the data set is:
There are 65 variables in this data set:
| Variable Name |
|---|
| circuit |
| tournament |
| country |
| year |
| date |
| gender |
| match_num |
| w_player1 |
| w_p1_birthdate |
| w_p1_age |
| w_p1_hgt |
| w_p1_country |
| w_player2 |
| w_p2_birthdate |
| w_p2_age |
| w_p2_hgt |
| w_p2_country |
| w_rank |
| l_player1 |
| l_p1_birthdate |
| l_p1_age |
| l_p1_hgt |
| l_p1_country |
| l_player2 |
| l_p2_birthdate |
| l_p2_age |
| l_p2_hgt |
| l_p2_country |
| l_rank |
| score |
| duration |
| bracket |
| round |
| w_p1_tot_attacks |
| w_p1_tot_kills |
| w_p1_tot_errors |
| w_p1_tot_hitpct |
| w_p1_tot_aces |
| w_p1_tot_serve_errors |
| w_p1_tot_blocks |
| w_p1_tot_digs |
| w_p2_tot_attacks |
| w_p2_tot_kills |
| w_p2_tot_errors |
| w_p2_tot_hitpct |
| w_p2_tot_aces |
| w_p2_tot_serve_errors |
| w_p2_tot_blocks |
| w_p2_tot_digs |
| l_p1_tot_attacks |
| l_p1_tot_kills |
| l_p1_tot_errors |
| l_p1_tot_hitpct |
| l_p1_tot_aces |
| l_p1_tot_serve_errors |
| l_p1_tot_blocks |
| l_p1_tot_digs |
| l_p2_tot_attacks |
| l_p2_tot_kills |
| l_p2_tot_errors |
| l_p2_tot_hitpct |
| l_p2_tot_aces |
| l_p2_tot_serve_errors |
| l_p2_tot_blocks |
| l_p2_tot_digs |
Our data was already in tidy format, so we did not have much cleaning to do. However, in order to conduct our analysis, we tidied the data set by removing variables that are not pertinent to answering our questions.
The methods we used to tidy our data are as follows:
We did not include variables such as match duration or individual player performance statistics because they did not help answer the questions we have laid out. Additionally, the majority of the data for these variables was unknown, so they would not have been useful in our analysis.
| Variable | Description |
|---|---|
| circuit | Either AVP (USA) or FIVB (International) |
| country | Country where tournament played |
| year | Year of tournament |
| date | Date of match |
| gender | Gender of team |
| w_player1 | Winner player 1 name |
| w_p1_birthdate | Winner player 1 birth date |
| w_p1_age | Winner player 1 age |
| w_p1_hgt | Winner player 1 height in inches |
| w_p1_country | Winner player 1 country |
| w_player2 | Winner player 2 name |
| w_p2_birthdate | Winner player 2 birth date |
| w_p2_age | Winner player 2 age |
| w_p2_hgt | Winner player 2 height in inches |
| w_p2_country | Winner player 2 country |
| l_player1 | Losing player 1 name |
| l_p1_birthdate | Losing player 1 birth date |
| l_p1_age | Losing player 1 age |
| l_p1_hgt | Losing player 1 height in inches |
| l_p1_country | Losing player 1 country |
| l_player2 | Losing player 2 name |
| l_p2_birthdate | Losing player 2 birth date |
| l_p2_age | Losing player 2 age |
| l_p2_hgt | Losing player 2 height in inches |
| l_p2_country | Losing player 2 country |
| score | Match score separated by a dash and matches separated by a comma, eg 21 points to 12 points is 21-12 |
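The `score` format described in the table can be unpacked with a short Python sketch. It assumes every entry is a plain "winner-loser" pair of integers; any other annotations that may occur in the real data are not handled. The function name `parse_score` and the example string are illustrative only.

```python
def parse_score(score):
    """Split a score string such as '21-12, 21-15' into (winner, loser) point pairs."""
    pairs = []
    for part in score.split(","):
        winner_pts, loser_pts = part.strip().split("-")
        pairs.append((int(winner_pts), int(loser_pts)))
    return pairs


# Invented example: the match winner takes two of the three entries.
example = parse_score("21-12, 18-21, 15-10")
won_by_winner = sum(1 for w, l in example if w > l)
print(example, won_by_winner)
```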
The original data is sourced from: Vagner, A. (2020, July 20). BigTimeStats/beach-volleyball. Retrieved August 22, 2020, from https://github.com/BigTimeStats/beach-volleyball
To load the data set, we used a GitHub repository that hosts it, named “Tidy Tuesday”. The data set was sourced from this repository: Mock, J. (2020, May 19). rfordatascience/tidytuesday. Retrieved August 22, 2020, from https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-05-19/readme.md
For both the AVP and FIVB tournaments, a team consists of 2 players. The two players in a team can come from the same country or from different countries. In this section, our analysis focuses on finding the countries that had the largest number of winning teams. This will help us find the countries that had the most winning players.
In order to find our answer to this question, we first did some data wrangling to get the data set up for analysis. Then we followed the steps outlined below:
Figure 3.1: Top 20 countries with the most winning teams
Figure 3.1 shows the top 20 countries with the largest number of winning teams. The United States was the most dominant country, with a total of 4200 winning teams. Since each team has two players, this means that players from the United States recorded at least 8400 individual wins. In distant second place, Brazil had 258 winning teams, so Brazilian players recorded 516 individual wins in matches where both players in the team came from Brazil. In a close third place, Germany had 200 winning teams, or 400 individual wins. The remaining 17 countries in this plot ranged from 166 winning teams down to 45.
The clear winner here is the United States, and we can conclude that the majority of the winning players in the AVP and FIVB tournaments hail from the United States.
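The counting behind a ranking like Figure 3.1 can be sketched in Python. This is an illustration under stated assumptions, not the report's own code: the rows are invented, the helper `winning_teams_by_country` is hypothetical, and a team is attributed to a country only when both winners share it.

```python
from collections import Counter

# Invented rows; only the two winner-country columns matter here.
matches = [
    {"w_p1_country": "United States", "w_p2_country": "United States"},
    {"w_p1_country": "Brazil", "w_p2_country": "Brazil"},
    {"w_p1_country": "United States", "w_p2_country": "United States"},
    {"w_p1_country": "United States", "w_p2_country": "Brazil"},  # mixed team, skipped
    {"w_p1_country": "Germany", "w_p2_country": "Germany"},
]


def winning_teams_by_country(rows, top_n=20):
    """Count winning teams per country, keeping only same-country teams."""
    counts = Counter(
        row["w_p1_country"]
        for row in rows
        if row["w_p1_country"] == row["w_p2_country"]
    )
    return counts.most_common(top_n)


ranking = winning_teams_by_country(matches)
print(ranking)
```

Doubling each count gives the number of individual wins, which is how the player figures in the text follow from the team figures.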
We decided to dig further into the United States. Although there were 4200 teams in which both players came from the United States, there were instances where 1 player came from the United States and the other came from a different country. The following section looks at the different countries that partnered with the United States.
In order to find the different countries that partnered with the United States, we followed the steps outlined below:
This gave us a list of all the country combinations in which either player 1 or player 2 came from the United States, together with the other player’s country.
| Player 1 country | Player 2 country | Number of teams |
|---|---|---|
| United States | United States | 936 |
| United States | Brazil | 31 |
| Brazil | United States | 16 |
| United States | Australia | 16 |
| Canada | United States | 11 |
| Philippines | United States | 8 |
| United States | England | 7 |
| United States | Poland | 7 |
| United States | Azerbaijan | 6 |
| Australia | United States | 5 |
| England | United States | 5 |
| United States | Puerto Rico | 5 |
| Poland | United States | 4 |
| United States | Israel | 4 |
| Italy | United States | 3 |
| New Zealand | United States | 3 |
| Puerto Rico | United States | 3 |
| United States | Philippines | 3 |
| Israel | United States | 2 |
| Russia | United States | 2 |
Table 3.1 shows 20 different country combinations, which is only a subset of the different countries that partnered with the United States. In total there were 66 different combinations.
Apart from teams in which both players came from the United States, Table 3.1 shows that 31 teams had player 1 from the United States and player 2 from Brazil, while 4 teams had player 1 from Poland and player 2 from the United States.
The rest of the table shows just how prominent the United States is as a competing country in volleyball tournaments. It is represented not only by teams in which both players come from the United States, but also by teams in which only 1 player comes from the United States and partners with a player from a different country.
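A table of country combinations like Table 3.1 could be tallied as in the Python sketch below. The pairs are invented and the helper name `us_combinations` is hypothetical. It treats (player 1, player 2) order as significant, matching the separate rows the table shows for each ordering.

```python
from collections import Counter

# Invented (player 1 country, player 2 country) pairs, one per team.
teams = [
    ("United States", "United States"),
    ("United States", "Brazil"),
    ("Brazil", "United States"),
    ("United States", "Brazil"),
    ("Brazil", "Brazil"),  # no United States player, excluded
]


def us_combinations(pairs, home="United States"):
    """Count ordered country pairs in which at least one player is from `home`."""
    return Counter(pair for pair in pairs if home in pair)


combos = us_combinations(teams)
for (p1, p2), n in combos.most_common():
    print(p1, "|", p2, "|", n)
```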
N.B. For the method used to complete this analysis, please refer to the commentary included within the code chunks.
The average ages for male winning players 1 and 2 are 29.40 and 29.32 respectively. The average ages for male losing players 1 and 2, on the other hand, are 29.08 and 28.95 respectively. There is no obvious link between age and winning or losing, as the average age of losers and winners is about the same.
This might tell us something, however, about the typical age of participation in professional male volleyball. If we plot the ages of, for instance, male winning player 1 (Figure 3.2) and male losing player 2 (Figure 3.3), we see that the most commonly occurring ages are in the late 20s (28-29 years of age). Given the high levels of participation at those ages, it is reasonable to infer that male volleyball players hit their peak in their late 20s.
Now, let’s consider women’s volleyball. The average ages for female winning players 1 and 2 are 27.98 and 28.29 respectively. The average ages for female losing players 1 and 2 are 27.52 and 27.73 respectively. As was the case with the male game, age does not seem to strongly influence winning. However, it is interesting to note that there is a slight difference in the average age of winning and losing players between the genders. Figure 3.4 shows that the average age of winning player 2 is lower for females than for males. Similarly, Figure 3.5 shows that the average age of losing player 1 is also lower for females than for males.
Figure 3.2: Ages of Male Winning Player 1
Figure 3.3: Ages of Male Losing Player 2
Figure 3.4: Ages of Winning Player 2 by gender
Figure 3.5: Ages of Losing Player 1 by gender
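The group averages quoted above could be computed along the lines of the following Python sketch. The ages are invented, the helper `mean_by_gender` is hypothetical, and the gender codes "M" and "W" are an assumption about how the `gender` column is encoded.

```python
from collections import defaultdict
from statistics import mean

# Invented rows; None marks a missing age.
rows = [
    {"gender": "M", "w_p1_age": 30.0, "l_p1_age": 28.0},
    {"gender": "M", "w_p1_age": 29.0, "l_p1_age": 30.0},
    {"gender": "W", "w_p1_age": 27.0, "l_p1_age": 28.0},
    {"gender": "W", "w_p1_age": 29.0, "l_p1_age": None},
]


def mean_by_gender(data, column):
    """Average one column per gender, ignoring missing values."""
    groups = defaultdict(list)
    for row in data:
        if row[column] is not None:
            groups[row["gender"]].append(row[column])
    return {gender: round(mean(values), 2) for gender, values in groups.items()}


winner_ages = mean_by_gender(rows, "w_p1_age")
loser_ages = mean_by_gender(rows, "l_p1_age")
print(winner_ages, loser_ages)
```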
N.B. For the method used to complete this analysis, please refer to the commentary included within the code chunks.
The average heights for female winning players 1 and 2 are 70.91 and 70.85 inches respectively. The average heights for female losing players 1 and 2 are 70.62 and 70.72 inches respectively. Although the average height of the losing players is less than that of the winning players, the difference is small.
The average heights for male winning players 1 and 2 are 76.28 and 76.39 inches respectively, compared to 75.98 and 76.15 inches for losing players 1 and 2 respectively. Consider Figures 3.6 and 3.7, which display the difference in heights between male winning and losing player 1 (Fig. 3.6) and male winning and losing player 2 (Fig. 3.7). In both cases, the differences in height are fairly evenly centred around 0, so we probably can’t say that height difference affects winning a volleyball game. We can, however, say that male volleyball participants are generally taller than female volleyball participants, although common sense tells us this is not unique to volleyball.
Figure 3.6: Difference in Heights of Male Player 1
Figure 3.7: Difference in Heights of Male Player 2
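The height differences plotted in Figures 3.6 and 3.7 are per-match values of winner height minus loser height. A minimal Python sketch of that calculation follows, using invented heights in inches; it is an illustration rather than the code behind the figures.

```python
from statistics import mean

# Invented heights in inches for player 1 on each side of three matches.
rows = [
    {"w_p1_hgt": 77, "l_p1_hgt": 75},
    {"w_p1_hgt": 74, "l_p1_hgt": 76},
    {"w_p1_hgt": 76, "l_p1_hgt": 76},
]

# Positive values mean the winning player was taller.
diffs = [row["w_p1_hgt"] - row["l_p1_hgt"] for row in rows]
mean_diff = mean(diffs)
print(diffs, mean_diff)
```

A mean close to 0, as in this toy example, is what the text describes as differences centred around 0.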
In team sports, the term home advantage describes the benefit that the home team is said to gain over the visiting team. The home team is more accustomed to the weather, temperature and other natural conditions in the competition area. Additionally, it faces no jet lag and enjoys a sense of psychological security. Home advantage is therefore a frequently mentioned topic in sports competitions, and we are curious whether winning beach volleyball teams benefit from it. The analysis is as follows.
Firstly, since the host country and the contestants’ countries in AVP competitions are almost always the United States, it is not meaningful to discuss this issue for the AVP, so I only use the data from FIVB competitions. Let’s first look at the home winning rate regardless of gender. I select the observations in which the country where the tournament was played equals the country the winner is from, then count these matches and save the result as the variable “num_winner”. After that, I compute the total number of matches hosted in every country each year and save it as the variable “num_total”. Next, I join these two tables together. Finally, I divide “num_winner” by “num_total” to get the winning rate regardless of gender.
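The steps above can be sketched in Python as follows. This is not the report's own code. The rows are invented, the helper `home_win_rate` is hypothetical, the winner's country is taken from `w_p1_country` alone as a simplifying assumption, and the rate is grouped by host country only.

```python
from collections import Counter

# Invented rows: circuit, host country, and winner player 1 country.
matches = [
    {"circuit": "FIVB", "country": "Brazil", "w_p1_country": "Brazil"},
    {"circuit": "FIVB", "country": "Brazil", "w_p1_country": "Germany"},
    {"circuit": "FIVB", "country": "Brazil", "w_p1_country": "United States"},
    {"circuit": "FIVB", "country": "Brazil", "w_p1_country": "Germany"},
    {"circuit": "FIVB", "country": "Italy", "w_p1_country": "Brazil"},
    {"circuit": "FIVB", "country": "Italy", "w_p1_country": "Italy"},
    {"circuit": "AVP", "country": "United States", "w_p1_country": "United States"},
]


def home_win_rate(rows, circuit="FIVB"):
    """Percentage of matches hosted in each country that a home winner took."""
    rows = [r for r in rows if r["circuit"] == circuit]
    num_total = Counter(r["country"] for r in rows)
    num_winner = Counter(
        r["country"] for r in rows if r["w_p1_country"] == r["country"]
    )
    # A Counter returns 0 for missing keys, so hosts with no home wins get 0.
    return {c: 100 * num_winner[c] / num_total[c] for c in num_total}


rates = home_win_rate(matches)
print(rates)
```

Filtering the rows by gender before calling the function gives the per-gender rates discussed later in this section.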
After getting the results, I make a bar plot showing them in descending order.
Figure 3.8 shows that across all eight years, although the United States has the highest home winning rate at 35.68%, every country’s home winning rate is below fifty percent. In other words, home teams do not win the majority of matches played in their own country, which suggests that home advantage is not obvious in FIVB competition.
Figure 3.8: Home winning rate for all teams
Although home advantage is not obvious for winning teams in general, is there any difference between the winning teams of different genders?
We start with the women’s teams. Building on the previous method, I added a filter to keep only women’s teams, calculated the home winning rate, and displayed the results in Figure 3.9. The United States has the highest rate at 39.56%, and no country has a rate over fifty percent, much the same as in the overall case. This indicates that home advantage is still not obvious for women’s teams.
Figure 3.9: Home winning rate for women
Using the same method as for the women’s teams, only the men’s teams are selected and the home winning rate is calculated. The results are shown in Figure 3.10. Home advantage is also not obvious for men, with the highest rate, 32.41%, reached by the United States.
Figure 3.10: Home winning rate for men
Generally speaking, home advantage does not hold for winning teams in FIVB tournaments, regardless of gender. However, looking further, we find that the home winning rate of women is higher than that of men in most countries. The exception is Poland, where the men’s home winning rate (9.68%) is higher than the women’s (4.26%).
In addition, we can also see that no men’s team from England or Greece has ever won at home, which may indicate that men’s beach volleyball in these two countries is not strong enough or does not receive much attention. In any case, the United States has the highest home winning rate, which is consistent with it having the most winning teams, as explained previously.
The analysis so far suggests that the winning team tends to be a strong team, but are there situations in which a lower-ranked team defeats a higher-ranked one? First, I focus on FIVB tournaments and filter the women’s matches in which the lower-ranked team defeated the higher-ranked one. I then count these matches and save the result as the variable “num_rank”. After that, I compute the total number of matches hosted in every country each year and save it as the variable “num_total”. Next, I join these two tables together. Finally, I divide “num_rank” by “num_total” to get the proportion of matches in which the lower-ranked team won. Second, I use the same method to compute the rate for men.
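The proportion described above can be sketched in Python. This is an illustration, not the report's own code. The rows are invented, the helper `upset_rate` is hypothetical, and ranks are assumed to be plain integers where a smaller number means a better rank, so an upset is a match with `w_rank` greater than `l_rank`.

```python
from collections import defaultdict

# Invented rows for one host country.
matches = [
    {"gender": "W", "country": "Italy", "w_rank": 9, "l_rank": 2},
    {"gender": "W", "country": "Italy", "w_rank": 1, "l_rank": 5},
    {"gender": "W", "country": "Italy", "w_rank": 7, "l_rank": 3},
    {"gender": "M", "country": "Italy", "w_rank": 2, "l_rank": 8},
    {"gender": "M", "country": "Italy", "w_rank": 6, "l_rank": 4},
]


def upset_rate(rows, gender):
    """Percentage of matches per host country won by the lower-ranked team."""
    num_total = defaultdict(int)
    num_rank = defaultdict(int)
    for r in rows:
        if r["gender"] != gender:
            continue
        num_total[r["country"]] += 1
        if r["w_rank"] > r["l_rank"]:  # winner had the worse (larger) rank
            num_rank[r["country"]] += 1
    return {c: 100 * num_rank[c] / num_total[c] for c in num_total}


women = upset_rate(matches, "W")
men = upset_rate(matches, "M")
print(women, men)
```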
Figure 3.11 shows that tournaments in Italy have the highest proportion of lower-ranked teams defeating higher-ranked teams, at about 63.64%, and tournaments in England have the lowest, at 47.92%. In most host countries, lower-ranked teams won more than half of the matches.
Figure 3.11: Low ranking team beat higher ranking team (Women, FIVB)
Figure 3.12 shows the results for men. In tournaments in Greece, about 63.46% of matches were won by the lower-ranked team. The only two countries below fifty percent are Brazil and China, with proportions of 44.44% and 47.92% respectively. As with women, in most host countries lower-ranked men’s teams won more than half of the matches.
However, this proportion is greater for women than for men in most countries, which may indicate that the overall strength of the women’s teams is greater than that of the men’s teams. There is one exception, England, where the men’s proportion (50%) is higher than the women’s (47.92%). This may indicate that men’s beach volleyball is stronger in the UK.
Figure 3.12: Low ranking team beat higher ranking team (Men, FIVB)
Next, we turn to the AVP tournament. I use the same method applied in the FIVB analysis. To display the women’s and men’s rates in one plot, I manually create a tibble that contains only the gender and the rate value. As the host country for all AVP tournaments is the United States, I ignore the country variable. I also draw a plot to represent the results.
In Figure 3.13, the bar on the left shows the rate for men and the bar on the right shows the rate for women. The rate for men is about 50.25%, and the rate for women is around 51.15%. The gender difference in the AVP competition is therefore not obvious. As in the FIVB, lower-ranked teams beat higher-ranked teams in more than half of the matches.
Figure 3.13: Low ranking team beat higher ranking team (AVP)
Generally speaking, matches in which the lower-ranked team beats the higher-ranked team account for more than half of the total in both FIVB and AVP tournaments. Therefore, we can say that it is not uncommon for lower-ranked teams to beat higher-ranked ones in beach volleyball. It also indirectly indicates that ranking in beach volleyball competition may not fully reflect the strength and winning rate of a team.
After our analysis, we have concluded that a typical winning male volleyball team most likely has both players originating from the United States, with player one having an average age of 29.40 and an average height of 76.28 inches, and player two having an average age of 29.32 and an average height of 76.39 inches.
In addition, a typical winning female volleyball team most likely has both players originating from the United States, with player one having an average age of 27.98 and an average height of 70.91 inches, and player two having an average age of 28.29 and an average height of 70.85 inches.
Thanks to the contributors of these packages:
ggplot2 (Wickham 2016)
tidyverse (Wickham et al. 2019)
kableExtra (Zhu 2019)
bookdown (Xie 2020)
gridExtra (Auguie 2017)
plotly (Sievert 2020)
Auguie, B. (2017). gridExtra: Miscellaneous Functions for "Grid" Graphics. https://CRAN.R-project.org/package=gridExtra
BigTimeStats. (n.d.). BigTimeStats/beach-volleyball. GitHub. https://github.com/BigTimeStats/beach-volleyball
Mock, J. (2020, May 19). rfordatascience/tidytuesday. Retrieved August 22, 2020, from https://github.com/rfordatascience/tidytuesday/blob/master/data/2020/2020-05-19/readme.md
R Core Team (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
Sievert, C. (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. https://plotly-r.com
Vagner, A. (2020, July 20). BigTimeStats/beach-volleyball. Retrieved August 22, 2020, from https://github.com/BigTimeStats/beach-volleyball
Wickham, H. (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., Grolemund, G., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Xie, Y. (2020). bookdown: Authoring Books and Technical Documents with R Markdown. https://github.com/rstudio/bookdown
Zhu, H. (2019). kableExtra: Construct Complex Table with ‘kable’ and Pipe Syntax. R package version 1.1.0. https://CRAN.R-project.org/package=kableExtra